Research on Cassandra Data Compaction Strategies for Time-Series Data

Authors

  • Bai Lu
  • Yang Xiaohui
Abstract

The storage and analysis of time-series data is a subject of intense interest in current international database research. Time-series data, a sequence of data points collected at fixed time intervals, is an important basis for business analysis and future prediction. Cassandra, a widely used NoSQL database, is often used to store time-series data because of the characteristics of its data model. In real applications, the life cycle of time-series data is typically managed by setting a TTL; the actual delete operation is not executed immediately, and expired data is instead removed during compaction. This paper examines how different compaction strategies affect time-series data storage, studying three Cassandra compaction strategies: Size-Tiered Compaction Strategy, Leveled Compaction Strategy, and Date-Tiered Compaction Strategy. The strategies are compared in tests based on stable data storage, recording speed, the number of sorted string table (SSTable) files, and so on. Finally, the experiments identify the compaction strategies suitable for time-series data application scenarios.
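As a rough illustration of the setup the abstract describes, the sketch below generates one CQL table definition per compaction strategy, each with a table-level TTL. It is not taken from the paper: the keyspace, table, and column names are hypothetical, and the TTL value is arbitrary. Only the three compaction class names and the `default_time_to_live` table option are actual Cassandra settings, and DateTieredCompactionStrategy has been deprecated in later Cassandra releases, so the third statement applies to versions contemporary with the paper.

```python
# Hypothetical keyspace, table, and column names; only the compaction class
# names and the default_time_to_live option are real Cassandra settings.
STRATEGIES = {
    "stcs": "SizeTieredCompactionStrategy",
    "lcs": "LeveledCompactionStrategy",
    "dtcs": "DateTieredCompactionStrategy",
}


def create_table_cql(strategy: str, ttl_seconds: int) -> str:
    """Build a CREATE TABLE statement for a time-series table."""
    compaction_class = STRATEGIES[strategy]
    return (
        f"CREATE TABLE metrics.readings_{strategy} (\n"
        "    sensor_id text,\n"
        "    ts timestamp,\n"
        "    value double,\n"
        "    PRIMARY KEY (sensor_id, ts)\n"
        ") WITH CLUSTERING ORDER BY (ts DESC)\n"
        f"  AND compaction = {{'class': '{compaction_class}'}}\n"
        f"  AND default_time_to_live = {ttl_seconds};"
    )


if __name__ == "__main__":
    # Print one statement per strategy, with an arbitrary one-day TTL.
    for name in STRATEGIES:
        print(create_table_cql(name, ttl_seconds=86400))
        print()
```

Rows written to such a table expire after the TTL, but the space is reclaimed only when compaction rewrites the SSTables that hold them, which is why the choice of strategy matters for this workload.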


Related resources

Bigtable Merge Compaction

We initiate the formal study of the online stack-compaction policies used by big-data NoSQL databases such as Google Bigtable, Hadoop HBase, and Apache Cassandra. We propose a deterministic policy, show that it is optimally competitive, benchmark it against Bigtable’s default policy, and suggest five interesting open problems.


K-Slot SSTable Stack Compaction

We initiate the formal study of the online stack-compaction policies used by big-data NoSQL databases such as Google Bigtable, Hadoop HBase, and Apache Cassandra. We propose a deterministic policy, show that it is optimally competitive, benchmark it against Bigtable’s default policy, and suggest five interesting open problems.


Lightweight Indexing for Log-Structured Key-Value Stores

The recent shift towards write-intensive workloads on big data (e.g., financial trading, social user-generated data streams) has driven the proliferation of log-structured key-value stores, represented by Google’s BigTable [1], Apache HBase [2] and Cassandra [3]. While providing key-based data access with a Put/Get interface, these key-value stores do not support value-based access methods, which...


On the Detection of Trends in Time Series of Functional Data

A sequence of functions (curves) collected over time is called a functional time series. Functional time series analysis is one of the popular research areas in which statistics from such data are frequently observed. The main purpose of the functional time series is to predict and describe random mechanisms that resulted in generating the data. To do so, it is needed to decompose functional ti...


Fitting of Count Time Series Models on the Number of Patients Referred to Addiction Treatment Centers in Semnan County

Abstract. Count data over time are observed in many application areas. Many researchers use time series patterns to analyze these data. In this paper, Poisson and negative binomial linear count time series models with explanatory variables are studied. The likelihood analysis and the evaluation of count time series models based on generalized linear models are pres...



Journal title:
  • JCP

Volume 11, Issue 

Pages  -

Publication date 2016